For our final tutorial, I (Annie Donahue) will be working together with Shay Prinz. We plan to work with 911 call data from 2019 and 2020, possibly looking back at earlier years if it seems necessary or if we have the time. We would like to manipulate this data to see how different events and phenomena affect either people's ability to call 911 or their willingness to do so.

  For example, we would like to see what the data can tell us about pre-COVID-19 versus post-COVID-19 911 calls by comparing the data from February 2020 with May 2020, and then using February and May 2019 as a control to see whether anything changed. We were also thinking about looking at 911 call data during times of civil unrest when there was prominent anti-police sentiment in the U.S. Right now, for example, it would be interesting to see whether fewer people are calling the police as protests calling for police abolition continue across the country. This data can tell us a lot about the current state of New Orleans as a whole, as we will be able to see both crime-reporting data and 911 calls related to health issues. These datasets encompass a lot of information, and it may be difficult to filter them in a way that is digestible enough to show the true effects of the events we are centering our analysis around.

  A specific challenge Shay came across when transforming the data was converting from string to datetime. Because the dataset is so large, the conversion was taking over 10 minutes. This is because pandas falls back to dateutil.parser.parse when the strings are in a non-default format and no format string is supplied. She was able to speed up the process a bit by adding the parameter "infer_datetime_format=True".
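An even faster option, when the timestamp layout is known in advance, is to pass an explicit `format=` string to `pd.to_datetime`, which skips inference entirely. A minimal sketch (the `%m/%d/%Y %I:%M:%S %p` layout here is an assumption about how the raw CSV stores its timestamps, not something taken from the dataset):

```python
import pandas as pd

raw = pd.Series(["03/28/2019 01:11:19 PM", "03/28/2019 05:02:07 PM"])
# An explicit format string avoids per-row format inference entirely.
parsed = pd.to_datetime(raw, format="%m/%d/%Y %I:%M:%S %p")
print(parsed.dt.hour.tolist())
```

On a large column, this typically beats both the default parser and `infer_datetime_format=True`.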

  We are planning on meeting once a week as a starting point, so as to gauge how much we can get done in one sitting and make sure we are on a good pace to complete our tutorial within the semester. If need be, we will move up to meeting multiple times a week, especially as we get closer to the end of the semester. Mostly, we will be completing our work on GitHub, and then using the weekly meeting to make a timeline for the next week, set goals and checkpoints to accomplish before the next time we meet, check in on each other's progress, and make sure our code and our analyses are compatible.

  We would like to plan at least one big goal to accomplish each week, which is a hard deadline for us both, as well as a stretch goal that we would ideally finish before the next week but that could be pushed to the following week if it became clear we didn't have enough time. This keeps us on a steady track while working with both of our busy schedules: some weeks we may have more time and be able to get a larger bulk of work done, but even in our busier weeks we would have to finish the set goal, keeping us moving steadily toward completing the project. Ideally, our weeks will revolve around Thursdays, since we are in the service-learning portion of the class and would like to use those times to share updates with our service-learning partner and our class.

This dataset reflects incidents that have been reported to the New Orleans Police Department in 2019. Data is provided by Orleans Parish Communication District (OPCD), the administrative office of 9-1-1 for the City of New Orleans.

In [2]:
import pandas as pd
#read the whole file in one pass (low_memory=False) so pandas infers consistent dtypes for mixed columns like 'Beat'
Calls2019_df = pd.read_csv("../Calls_for_Service_2019.csv",low_memory=False)
In [3]:
#Tidy data by dropping columns that won't be used (NOPD_Item, MapX, MapY, TimeClosed, Beat); Location is handled in the next cell
Calls2019_df = Calls2019_df.drop(columns=['NOPD_Item', 'MapX', 'MapY', 'TimeClosed', 'Beat'])
In [4]:
#Split the "(lat, lon)" Location strings into separate columns
Calls2019_df[['Latitude','Longitude']] = Calls2019_df.Location.str.split(",", expand=True)
#Remove location column
del Calls2019_df['Location']
#Strip parentheses from left and right side (raw strings avoid invalid-escape warnings)
Calls2019_df['Latitude'] = Calls2019_df['Latitude'].replace(to_replace=r'\(', value="", regex=True)
Calls2019_df['Longitude'] = Calls2019_df['Longitude'].replace(to_replace=r'\)', value="", regex=True)
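The split-then-strip steps above could also be collapsed into a single regex extraction. A sketch, assuming the `Location` strings look like `(lat, lon)` as in this dataset:

```python
import pandas as pd

loc = pd.Series(["(29.96455831, -90.07992969)"])
# One pass: capture the two signed decimals inside the parentheses.
coords = loc.str.extract(r"\(([-\d.]+),\s*([-\d.]+)\)")
coords.columns = ["Latitude", "Longitude"]
print(coords.iloc[0].tolist())
```

Rows whose `Location` does not match the pattern come back as NaN, which makes malformed values easy to spot.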
In [5]:
#Calls2019_df.dtypes
#Convert TimeCreate, TimeDispatch, and TimeArrival to datetime
Calls2019_df['TimeCreate'] = pd.to_datetime(Calls2019_df['TimeCreate'],infer_datetime_format=True)
Calls2019_df['TimeDispatch'] = pd.to_datetime(Calls2019_df['TimeDispatch'],infer_datetime_format=True)
Calls2019_df['TimeArrival'] = pd.to_datetime(Calls2019_df['TimeArrival'],infer_datetime_format=True)
#Remove all rows with years that are not 2019
Calls2019_df = Calls2019_df[(Calls2019_df['TimeCreate'].dt.year == 2019)]
#missing TimeDispatch/TimeArrival values are left as NaT
Calls2019_df
Out[5]:
Type TypeText Priority TimeCreate TimeDispatch TimeArrival Disposition DispositionText InitialType InitialTypeText InitialPriority SelfInitiated BLOCK_ADDRESS Zip PoliceDistrict Latitude Longitude
0 58 RETURN FOR ADDITIONAL INFO 1I 2019-03-28 13:11:19 2019-03-28 13:11:19 NaT NAT Necessary Action Taken 58 RETURN FOR ADDITIONAL INFO 1I N N Galvez St & Conti St 70119 1 29.96455831 -90.07992969
1 62L BURGLAR ALARM, LOCAL 2D 2019-03-28 17:02:07 2019-03-28 17:05:16 NaT NAT Necessary Action Taken 62L BURGLAR ALARM, LOCAL 2D N 020XX Canal St 70112 1 29.9607251 -90.08135199
2 22A AREA CHECK 1K 2019-03-08 15:50:25 2019-03-08 15:50:25 2019-03-08 15:50:30 NAT Necessary Action Taken 22A AREA CHECK 1K N Catina St & Robert E Lee Blvd 70124 3 30.02002615 -90.1125062
3 21 COMPLAINT OTHER 1J 2019-03-28 18:52:04 NaT NaT VOI VOID 63 PROWLER 2E N 001XX N Dorgenois St 70119 1 29.96505415 -90.08533671
4 62C SIMPLE BURGLARY VEHICLE 1G 2019-03-28 15:54:27 2019-03-28 16:18:37 NaT RTF REPORT TO FOLLOW 62C SIMPLE BURGLARY VEHICLE 1G N 005XX Soraparu St 70130 6 29.92354229 -90.07466234
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
485281 62A BURGLAR ALARM, SILENT 1A 2019-12-22 22:43:51 2019-12-22 22:45:09 NaT DUP DUPLICATE 62A BURGLAR ALARM, SILENT 2E N 033XX Law St 70117 5 29.98221273 -90.03670531
485282 TS TRAFFIC STOP 1J 2019-12-29 00:47:57 2019-12-29 00:47:57 2019-12-29 00:47:57 NAT Necessary Action Taken TS TRAFFIC STOP 1J Y Tulane Ave & S Jefferson Davis Pkwy 70119 1 29.96514325 -90.09811168
485283 107 SUSPICIOUS PERSON 2C 2019-12-28 23:10:58 2019-12-28 23:12:05 2019-12-28 23:18:32 NAT Necessary Action Taken 107 SUSPICIOUS PERSON 2C N 024XX N Dorgenois St 70117 5 29.98290759 -90.05262703
485284 62A BURGLAR ALARM, SILENT 1A 2019-12-24 14:34:55 2019-12-24 17:02:19 2019-12-24 17:23:40 NAT Necessary Action Taken 62A BURGLAR ALARM, SILENT 2E N 030XX Grand Route St John 70119 1 29.97963871 -90.08212947
485285 103 DISTURBANCE (OTHER) 2A 2019-12-27 18:16:32 2019-12-27 18:18:30 2019-12-27 18:22:56 GOA GONE ON ARRIVAL 103 DISTURBANCE (OTHER) 1C N 027XX Monticello Ave 70118 2 29.96711533 -90.12825085

481638 rows × 17 columns

This dataset reflects incidents that have been reported to the New Orleans Police Department in 2020. Data is provided by Orleans Parish Communication District (OPCD), the administrative office of 9-1-1 for the City of New Orleans.

In [6]:
Calls2020_df = pd.read_csv("../Call_for_Service_2020.csv")
In [7]:
#Tidy data by dropping columns that won't be used (NOPD_Item, MapX, MapY, TimeClosed, Beat); Location is handled in the next cell
Calls2020_df = Calls2020_df.drop(columns=['NOPD_Item', 'MapX', 'MapY', 'TimeClosed', 'Beat'])
In [8]:
#Clean up location column to get Latitude and Longitude columns
Calls2020_df['Location'] = Calls2020_df['Location'].replace(to_replace='POINT ', value="", regex=True)
Calls2020_df[['Longitude','Latitude']] = Calls2020_df.Location.str.split(" ",expand=True)
#Remove location column
del Calls2020_df['Location']
#Strip parentheses from left and right side
Calls2020_df['Longitude'] = Calls2020_df['Longitude'].replace(to_replace='\(', value="", regex=True)
Calls2020_df['Latitude'] = Calls2020_df['Latitude'].replace(to_replace='\)', value="", regex=True)
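Because the 2020 file stores coordinates in WKT-style `POINT (lon lat)` strings, the replace-split-strip sequence above could likewise be a single extraction. A sketch under that assumption about the string layout:

```python
import pandas as pd

loc = pd.Series(["POINT (-90.0808922 30.0086791)"])
# Capture longitude then latitude from the WKT-style point string.
coords = loc.str.extract(r"POINT \(([-\d.]+)\s+([-\d.]+)\)")
coords.columns = ["Longitude", "Latitude"]
print(coords.iloc[0].tolist())
```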
In [9]:
#Calls2020_df.dtypes
#Convert TimeCreate, TimeDispatch, and TimeArrive to datetime
Calls2020_df['TimeCreate'] = pd.to_datetime(Calls2020_df['TimeCreate'],infer_datetime_format=True)
Calls2020_df['TimeDispatch'] = pd.to_datetime(Calls2020_df['TimeDispatch'],infer_datetime_format=True)
Calls2020_df['TimeArrive'] = pd.to_datetime(Calls2020_df['TimeArrive'],infer_datetime_format=True)

#missing TimeDispatch/TimeArrive values are left as NaT
Calls2020_df
Out[9]:
Type TypeText Priority InitialType InitialTypeText InitialPriority TimeCreate TimeDispatch TimeArrive Disposition DispositionText SelfInitiated BLOCK_ADDRESS Zip PoliceDistrict Longitude Latitude
0 94F FIREWORKS 1A 103 DISTURBANCE (OTHER) 1C 2020-01-01 00:00:34 2020-01-01 02:34:18 NaT NAT Necessary Action Taken N 001XX Blk Riviera Ave 70122 3 -90.0808922 30.0086791
1 21 COMPLAINT OTHER 1J 21 COMPLAINT OTHER 1J 2020-01-01 00:00:42 2020-01-01 00:00:42 2020-01-01 00:00:42 NAT Necessary Action Taken Y 034XX Broadway St 70125 2 -90.10840522 29.95996774
2 94F FIREWORKS 1A 94F FIREWORKS 2J 2020-01-01 00:01:05 2020-01-01 01:49:42 2020-01-01 02:08:17 NAT Necessary Action Taken N 026XX Banks St 70119 1 -90.08872937 29.96246347
3 94 DISCHARGING FIREARM 2D 94 DISCHARGING FIREARM 2D 2020-01-01 00:02:50 2020-01-01 00:02:54 2020-01-01 00:09:13 GOA GONE ON ARRIVAL N Kerlerec St & N Robertson St 70116 1 -90.0669267 29.96960271
4 94F FIREWORKS 1A 94F FIREWORKS 2J 2020-01-01 00:03:46 2020-01-01 00:34:36 NaT NAT Necessary Action Taken N 053XX Memphis St 70124 3 -90.10813674 29.98517428
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
309526 62A BURGLAR ALARM, SILENT 2E 62A BURGLAR ALARM, SILENT 2E 2020-09-16 06:55:41 2020-09-16 06:57:24 2020-09-16 07:01:49 NAT Necessary Action Taken N 027XX Verbena St 70122 3 -90.04649337 30.00041767
309527 22A AREA CHECK 1K 22A AREA CHECK 1K 2020-09-16 15:29:32 2020-09-16 15:29:32 2020-09-16 15:29:47 NAT Necessary Action Taken N 007XX Canal St 70130 8 -90.06958596 29.95385694
309528 21 COMPLAINT OTHER 2A 911 SILENT 911 CALL 2F 2020-09-16 20:12:48 2020-09-16 20:14:57 2020-09-16 20:17:39 GOA GONE ON ARRIVAL N S Carrollton Ave & Tulane Ave 70119 1 -90.10642935 29.96893162
309529 22A AREA CHECK 1K 22A AREA CHECK 1K 2020-09-16 05:25:06 2020-09-16 05:25:06 2020-09-16 05:25:06 NAT Necessary Action Taken Y Basin St & Crozat St 70112 1 -90.07017072 29.96079068
309530 22A AREA CHECK 1K 22A AREA CHECK 1K 2020-09-16 14:33:24 2020-09-16 14:33:24 2020-09-16 14:33:39 NAT Necessary Action Taken N Baudin St & S Genois St 70119 1 -90.09910158 29.96789002

309531 rows × 17 columns

A challenge I came across when transforming the data was converting from string to datetime. Because the dataset is so large, the conversion was taking over 10 minutes. This is because pandas falls back to dateutil.parser.parse when the strings are in a non-default format and no format string is supplied. I was able to speed up the process a bit by adding the parameter "infer_datetime_format=True".
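Another trick that sometimes helps with large columns is to parse only the distinct strings and broadcast the results back with `map`; it only pays off when values repeat, so it is a situational optimization rather than a guaranteed win. A sketch with made-up timestamps:

```python
import pandas as pd

times = pd.Series(["01/01/2020 12:00:34 AM",
                   "01/01/2020 12:00:34 AM",
                   "01/01/2020 12:00:42 AM"])
# Parse each distinct string once, then broadcast the results with map().
lookup = {s: pd.to_datetime(s) for s in times.dropna().unique()}
parsed = times.map(lookup)
```

Here the duplicate first two rows cost only one parse between them.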

Graph to show number of 911 calls per month for 2019 and 2020

In [10]:
Calls2019_df = Calls2019_df.copy()
Calls2019_df['Year'] = Calls2019_df['TimeCreate'].dt.year
Calls2019_df['Month'] = Calls2019_df['TimeCreate'].dt.month
#Keep only January-August so both years cover the same period
CallsbyMonth2019 = Calls2019_df[Calls2019_df['Month'] < 9]

Calls2020_df = Calls2020_df.copy()
Calls2020_df['Year'] = Calls2020_df['TimeCreate'].dt.year
Calls2020_df['Month'] = Calls2020_df['TimeCreate'].dt.month
CallsbyMonth2020 = Calls2020_df[Calls2020_df['Month'] < 9]

CallsbyMonth = pd.concat([CallsbyMonth2019, CallsbyMonth2020])  #DataFrame.append is deprecated
CallsbyMonth = CallsbyMonth.reset_index(drop=True)
CallsbyMonth["Month"] = CallsbyMonth["Month"].map({
       1: "Jan",
       2: "Feb",
       3: "March",
       4: "Apr",
       5: "May",
       6: "June",
       7: "Jul",
       8: "Aug"
})

#result
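An alternative to the string map above plus the manual `monthOrder` reindex used later is an ordered categorical, which keeps the month labels in calendar order automatically. A sketch of the idea:

```python
import pandas as pd

labels = ["Jan", "Feb", "March", "Apr", "May", "June", "Jul", "Aug"]
months = pd.Series([3, 1, 2])
# Map month numbers to names, then make the names an ordered categorical.
named = months.map(dict(enumerate(labels, start=1)))
named = pd.Series(pd.Categorical(named, categories=labels, ordered=True))
print(named.sort_values().tolist())
```

With an ordered categorical, `groupby`, `pivot_table`, and plotting all respect calendar order without an explicit `loc[monthOrder]`.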
In [11]:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

result = CallsbyMonth.pivot_table(index='Month', 
                        columns='Year', values='Type', aggfunc='count').apply(lambda x:100*x / float(x.sum())).round(2)

monthOrder = ['Jan', 'Feb', 'March', 'Apr', 'May', 'June', 'Jul', 'Aug']
ax = result.loc[monthOrder].plot(kind="bar", legend=True, 
                                 title ="Percent of Calls by Month", figsize=(20, 10), fontsize=16)
plt.xlabel("Month")
plt.ylabel("Percent")
plt.show()

This graph shows the percent of calls made each month relative to the total calls made from January to August in each of 2019 and 2020. As you can see, the highest proportion of 2020 calls was made in January and February. This was before quarantine, so we cannot say that quarantine is correlated with an increased number of calls, as we had predicted. For 2019, the highest proportion of calls was made in May.
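The per-year normalization in the pivot can be sanity-checked on a toy frame: after the `apply`, each year's column should sum to 100. A minimal sketch with hypothetical counts:

```python
import pandas as pd

df = pd.DataFrame({"Month": ["Jan", "Jan", "Feb"],
                   "Year": [2019, 2020, 2019],
                   "Type": ["A", "B", "A"]})
# Count calls per (Month, Year), then convert each year's counts to percents.
result = df.pivot_table(index="Month", columns="Year",
                        values="Type", aggfunc="count")
result = result.apply(lambda x: 100 * x / float(x.sum())).round(2)
print(result)
```

Each column is normalized independently, which is why the bars compare the *shape* of each year's call volume rather than absolute counts.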

Map of Locations of 911 Calls in 2019

In [12]:
!pip install folium
Successfully installed branca-0.4.1 folium-0.11.0
In [13]:
import folium
In [14]:
#convert the string coordinates to floats for the map
locations = CallsbyMonth2019[['Latitude', 'Longitude']].dropna().astype(float)
locationlist = locations.values.tolist()
len(locationlist)
Out[14]:
322853
In [15]:
from folium.plugins import FastMarkerCluster

NOLA_location = [29.951065, -90.071533 ]

m = folium.Map(location=NOLA_location,
               zoom_start=15,
               tiles='openstreetmap')

m.add_child(FastMarkerCluster(locationlist, name='2019'))
Out[15]:
[Interactive folium map of 2019 call locations; the notebook must be trusted to render it]

This visualization shows the location of each 911 caller in 2019 (where available in the dataset).

Map of Locations of 911 Calls in 2020

In [16]:
#convert the string coordinates to floats for the map
locations1 = CallsbyMonth2020[['Latitude', 'Longitude']].dropna().astype(float)
locationlist1 = locations1.values.tolist()
len(locationlist1)
Out[16]:
290183
In [17]:
m1 = folium.Map(location=NOLA_location,
               zoom_start=15,
               tiles='openstreetmap')

m1.add_child(FastMarkerCluster(locationlist1, name='2020'))
m1
Out[17]:
[Interactive folium map of 2020 call locations; the notebook must be trusted to render it]

This visualization shows the location of each 911 caller in 2020 (where available in the dataset). Because so many calls are placed per year, you will need to zoom in to see more exact locations.

Top 10 Types of 911 Calls in April 2019

In [22]:
April2019 = CallsbyMonth2019.loc[(CallsbyMonth2019['Month'] == 4)
                                 & (CallsbyMonth2019['TypeText'] != 'AREA CHECK')
                                 & (CallsbyMonth2019['TypeText'] != 'BUSINESS CHECK')]
April2019['TypeText'].value_counts()[:10].plot.pie(figsize=(50,30), textprops={'fontsize': 44})
Out[22]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f5424ae9e80>

This visualization shows the top types of 911 calls made in April 2019. We removed the business checks and area checks because they were irrelevant to the kinds of information we are looking for, as they don't describe much. Comparing this visualization to the next one (Top 10 Types of 911 Calls in April 2020), even at first glance we see that there were far fewer traffic incidents in 2020, and that there is a whole new category for public gatherings that takes up a large portion of the calls. We can presume that both of these differences have to do with the COVID-19 pandemic.
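The "whole new category" observation can be made precise by diffing the top-type indices between the two months. A sketch with hypothetical toy counts standing in for the real `TypeText` columns:

```python
import pandas as pd

# Hypothetical toy data; the real call is value_counts() on each TypeText column.
april2019 = pd.Series(["TRAFFIC INCIDENT", "TRAFFIC INCIDENT", "THEFT"])
april2020 = pd.Series(["PUBLIC GATHERING", "THEFT"])
top2019 = set(april2019.value_counts()[:10].index)
top2020 = set(april2020.value_counts()[:10].index)
# Categories that appear in the 2020 top ten but not in 2019's
print(sorted(top2020 - top2019))
```

Run on the actual April slices, this lists exactly which call types are new to (or missing from) each year's top ten.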

Top 10 Types of 911 Calls in April 2020

In [23]:
April2020 = CallsbyMonth2020.loc[(CallsbyMonth2020['Month'] == 4)
                                 & (CallsbyMonth2020['TypeText'] != 'AREA CHECK')
                                 & (CallsbyMonth2020['TypeText'] != 'BUSINESS CHECK')]
April2020['TypeText'].value_counts()[:10].plot.pie(figsize=(50,30), textprops={'fontsize': 44})
Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f5424a47460>

Notice the decrease in traffic incidents, the new calls placed about public gatherings, and the increase in domestic disturbances!